Abstract

Major efforts of the project fall into four categories:

1) Investigations were performed on the relationship between the
dimensionality of a dataset and its underlying cognitive processes. The
datasets represent the computational arithmetic domains of addition and
subtraction of signed numbers and fractions. Error diagnostic computer
programs for these computational skills written on the PLATO® system
were used for examining each student's procedural rule. Various
analyses imply that the systematic application of erroneous rules by
many students causes multidimensionality of the data.


2)  Two  approaches  for  diagnosing  erroneous  rules  of  operation  were 
developed: an "error vector" system for constructing error diagnostic
programs for signed-number arithmetic and fraction addition problems, and
a series of logical statements for constructing diagnostic programs for
fraction problems. A series of experimental data collected between
1979 and 1982 revealed that the rate of diagnosing erroneous rules by these
deterministic  approaches  becomes  very  low  (about  50%)  when  learning  is 
most  active.  Hence,  it  is  impossible  to  help  students  with  prescriptive 
information  from  these  error  diagnostic  systems.  Moreover,  developing 
such  computer  programs  in  general  areas  will  be  painstaking  and 
time-consuming. 


3)  To  circumvent  the  problems  encountered  in  the  construction  of 
error diagnostic programs, two indices based on deterministic Guttman
theory were formed and used to detect aberrant response patterns. The
first of these indices was useful in categorizing erroneous rules into
serious  or  less-serious  errors.  The  second  index  proved  to  be  very 
powerful  for  detecting  erroneous  rules  resulting  from  the  students' 
misconceptions.  In  several  different  datasets  of  arithmetic  computations, 
the  detection  rates  were  always  higher  than  95%  of  the  erroneous  rules 
which  had  been  diagnosed  separately  by  the  error  vector  system. 


4) The necessity for dealing quantitatively with variations in
errors and changing rules of operation led to the investigation of
probabilistic models for error diagnosis based on item response theory.
A group of extended caution indices was formulated. These indices have
a prominent mathematical feature and some functional similarities to both
of the other indices; however, used traditionally, their detection rate
for some of the most frequent erroneous rules is unexpectedly low.
As an alternative approach, the concept of "rule space" was developed.
All responses, both correct and erroneous, are decomposed into components,
which are mapped into a vector space spanned by the true scores and one
of the standardized extended caution indices (ECI4z). A pattern
classification technique is used to separate each rule from its neighboring
points in the rule space. Since the ECI4z is a continuous function, those
points which plot close to a rule represent responses yielded by "slips"
or random errors, or by imperfect applications of the rule.
DIMENSIONALITY

One  of  the  first  goals  of  the  project  was  to  address  the  problem  of 
multidimensionality  of  achievement  test  data.  Latent  trait  theory  provides 
a  potentially  powerful  tool  for  locating  a  person's  ability  or  achievement 
level  within  a  hierarchical  set  of  test  items;  however,  latent  trait  models 
require  that  test  data  be  unidimensional  (i.e.,  that  they  measure  a  single 
trait).  On  the  other  hand,  achievement  test  data  usually  are  multidimensional, 
so  that  difficulties  were  anticipated  in  the  application  of  latent  trait 
theory  to  the  diagnosis  of  student  errors  on  achievement  tests. 

Several  datasets  were  collected  from  seventh  and  eighth  grade  students 
taking computerized tests on the PLATO® system during 1979-1980, while five
datasets were simulated on the PLATO® computer. All datasets were based on
test results within the domain of signed-number arithmetic. Tatsuoka, et
al. (Tatsuoka & Baillie, 1982a; Tatsuoka, et al., 1982) developed
"SIGNBUG",  a  set  of  computer  programs  on  the  PLATO®  system  which  analyze 
each  student's  procedural  rules  for  solving  signed-number  arithmetic 
problems. 

These  data  were  used  in  several  experiments  which  investigated  the 
relationship  between  the  dimensionality  of  a  dataset  and  the  cognitive 
processes  which  led  to  the  student's  solution  of  the  problems.  One  study 
examined  the  dimensionality  of  an  achievement  test  across  different 
learning stages under two different instructional methods (Tatsuoka,
1981). This study demonstrated that different instructional methods
affect the dimensionality of test scores to a large extent. The results
also indicate that in the early stages of learning, students tend to use
their rules of operation inconsistently during the test. This causes a
clear violation of the local independence assumption, which is essential
to latent trait theory.


A second study compared the dimensionality of achievement test scores
based only on correct answers, to scores based on whether the student used
the correct algorithm (Birenbaum & Tatsuoka, 1980, 1983). Results
indicate that in achievement data based on a specific arithmetic
problem-solving domain, the factorial structure of the data is strongly
affected when a variety of different algorithms underlie the student
responses, with a resulting increase in the dimensionality of the data.
The fact that students may get right answers by following a wrong rule is
reflected in the psychometric properties of the test. When the conventional
scoring system is used, it results in negative correlations among some
items, and increased dimensionality; when the scoring system takes into
consideration the thought processes of the student, there is a reduction
in dimensionality and considerably higher correlations among the tasks,
without changing their mean values.


The simulated data referred to above were used as a means to control
the number of algorithms underlying the responses, in order to study the
effect on dimensionality of the test data when the number of algorithms
increases (Birenbaum & Tatsuoka, 1982). This simulation was meant to
describe a situation in which 25% of the subjects knew nothing about the
topic being tested, merely guessing randomly for the answers, while
another 25% had mastered the tasks and answered all of the items
correctly. The remaining half of the subjects were presumed to have
mastered incorrect rules. The number of incorrect algorithms was
increased with each successive dataset: one in the first, two in the
second, up to five in the fifth, distributed in each set in equal
proportions.  In  order  to  make  the  effect  of  the  algorithms  clearer,  a 
hypothetical  situation  was  simulated  in  which  75%  of  the  responses  were 
consistent,  i.e.,  each  subject  used  the  same  rule  consistently 
throughout  the  test.  A  principal  components  analysis  of  these  data 
demonstrated  that  an  increase  in  the  number  of  wrong  algorithms  results 
in  a  decrease  in  both  the  reliability  coefficients  and  in  the  amount  of 
variance  accounted  for  by  the  first  factor.  A  similar  analysis  of  real 
data  collected  before  and  after  instruction,  as  well  as  for  two  kinds  of 
instruction,  indicated  less  heterogeneity  of  the  underlying  algorithms 
(i.e.,  fewer  wrong  rules)  after  instruction  than  before,  regardless  of 
the  kind  of  instruction. 

Two  mathematical  methods  for  extracting  unidimensional  subsets 
from  multidimensional  datasets  were  investigated.  An  algorithm  based 
on  graph  theory  efficiently  extracted  nonredundant  chains  of  items  using 
a  series  of  matrix  manipulations  performed  on  the  dominance  matrix 
(Yamamoto  &  Wise,  1980),  and  an  order-analysis  procedure  was  used 
successfully to isolate unidimensional item subsets in both real and
simulated  data  (Wise,  1981). 

ERROR  DIAGNOSIS 

Two  methods  were  developed  for  diagnosing  erroneous  rules  of  operation. 
In  the  first  of  these,  a  system  of  binary  error  vectors  was  generated  from 
item  responses  (Tatsuoka,  et  al.,  1980).  Operations  in  signed  number 
addition  were  decomposed  into  sign  and  absolute  value  components,  and  each 
component was represented by a vector of binary numbers accounting for all
possible operations for doing the problem. By a process of elementwise
multiplication  of  the  set  of  error  vectors,  a  particular  wrong  rule  can  be 
determined  uniquely,  provided  the  student  consistently  uses  that  rule. 
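
The elimination step can be sketched as follows. This is a minimal
illustration, not the programs written for the project: the three candidate
rules, their names, and the four test items are hypothetical, and each
item's error vector is assumed to hold a 1 for every candidate rule that
reproduces the student's answer on that item.

```python
from typing import Callable, List, Optional, Sequence, Tuple


def correct(a: int, b: int) -> int:
    """The right rule for signed-number addition."""
    return a + b


def add_magnitudes(a: int, b: int) -> int:
    """Hypothetical wrong rule: add absolute values, take the sign of
    the operand with the larger absolute value."""
    larger = a if abs(a) >= abs(b) else b
    sign = 1 if larger >= 0 else -1
    return sign * (abs(a) + abs(b))


def sign_of_first(a: int, b: int) -> int:
    """Hypothetical wrong rule: right magnitude, sign of the first operand."""
    sign = 1 if a >= 0 else -1
    return sign * abs(a + b)


# Candidate rules in a fixed order (names are illustrative assumptions).
RULES: List[Tuple[str, Callable[[int, int], int]]] = [
    ("correct", correct),
    ("add_magnitudes", add_magnitudes),
    ("sign_of_first", sign_of_first),
]

# Hypothetical test items (a, b) for the problem a + b.
ITEMS: List[Tuple[int, int]] = [(-3, 5), (4, -7), (-2, -6), (6, -1)]


def error_vector(item: Tuple[int, int], answer: int) -> List[int]:
    """Binary vector: 1 where a candidate rule reproduces the answer."""
    a, b = item
    return [1 if rule(a, b) == answer else 0 for _, rule in RULES]


def diagnose(items: Sequence[Tuple[int, int]],
             answers: Sequence[int]) -> Optional[str]:
    """Multiply the error vectors elementwise across items; a rule is
    diagnosed only if exactly one candidate survives."""
    product = [1] * len(RULES)
    for item, answer in zip(items, answers):
        vec = error_vector(item, answer)
        product = [p * v for p, v in zip(product, vec)]
    if sum(product) == 1:
        return RULES[product.index(1)][0]
    return None  # no rule, or more than one rule, fits every response


print(diagnose(ITEMS, [8, -11, -8, 7]))  # consistent use of one wrong rule
print(diagnose(ITEMS, [8, 3, -8, 5]))    # rule switched during the test
```

In this sketch a student who switches rules partway through the test leaves
no surviving candidate, which mirrors the proviso that the student must use
the rule consistently.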

The  second  method  for  diagnosing  erroneous  rules  consisted  in  the 
derivation  of  a  series  of  logical  statements  to  be  used  for  constructing 
error-diagnostic programs. Klein, et al. (1981), described and
illustrated a procedure for constructing error-diagnostic items for
addition and subtraction of fractions, based on a procedural network.
This approach is too complex to be practical, and the need for defining a
hierarchy of item difficulty was recognized.

Tatsuoka  and  Tatsuoka  (1981b)  described  a  system  of  order  analysis  that 
was  developed  by  Takeya  (1981),  called  item  relation  structure  analysis, 
and  used  it  for  examining  the  structural  relations  among  a  set  of  24  items 
in  addition  and  subtraction  of  fractions.  The  goal  was  to  devise  a  technique 
for  investigating  the  item  structure  with  respect  to  the  roles  of  each  item 
in  determining  the  student's  misconceptions.  Results  were  inconclusive, 
though  promising,  and  the  need  for  further  study  was  indicated. 

Standiford,  et  al.  (1982),  described  a  procedural  network  for  solving 
problems  in  decimal  fraction  addition  and  subtraction,  and  compared  the 
item  dominance  predicted  from  this  network  with  that  predicted  from  the 
item relation structure analysis model. In general, the latter model
confirmed  item  dominance  patterns  predicted  by  the  former. 

Chevalaz  and  Tatsuoka  (1983)  described  and  compared  two  order  analytic 
techniques  for  analyzing  the  structure  of  a  test.  Ordering  theory  of  Krus 
and  Bart  (1974),  and  the  item  relation  structure  analysis  method  proposed 
by Tatsuoka and Tatsuoka (1981b) were used to extract the hierarchical
item structure from three datasets. It was found that the Krus and Bart
procedure more adequately represented the complex interrelationships among
test data, but that use of the item relation structure analysis appears
to be more appropriate when the data contain many errors.

As part of the effort to identify and catalog specific erroneous rules,
Shaw, et al. (1982), analyzed results of a written test in fraction
addition,  and  interviewed  many  of  the  students.  Their  report  describes 
the  test  performance  of  26  students  who  displayed  a  variety  of  erroneous 
rules.  The  cases  were  selected  for  their  potential  usefulness  in 
designing  and  implementing  an  error-diagnostic  testing  system  and  in 
designing  appropriate  remediation. 

Tatsuoka  (1981)  attempted  to  quantify  the  relative  seriousness  of 
errors  in  signed  number  addition  problems.  All  component  procedures  for 
carrying  out  the  addition  problems  properly  were  expressed  by  a 
hierarchical  tree.  Then,  each  erroneous  rule  was  characterized  by 
assigning  two  quantities,  representing  what  and  how  many  steps  were 
followed  to  produce  the  responses.  If  a  rule  were  the  result  of  a 
misconception  at  an  earlier  level  in  the  network,  then  it  was  more 
likely  committed  by  students  in  the  early  stage  of  learning,  or  by  lower 
ability  students.  For  students  nearing  mastery,  any  erroneous  rules 
would  be  due  to  mistakes  from  the  latter  part  of  the  procedural  network. 

A  procedural  steps  conformity  index  was  designed  to  express  quantitatively 
both  single  and  compound  error  sources.  The  need  for  further  work  in 
generalizing  these  procedures  was  recognized. 

Using  the  same  error  classification  system,  Tatsuoka  (1984)  divided 
27 erroneous rules of signed-number addition problems into two groups,
non-serious (A) and serious (B) error types, in order to investigate
changes over time in their rate of incidence. Forty-five subjects from
junior high school took a test in signed-number addition, which was
administered six times at various stages of instruction over a period
of a year and a half. Those students whose verbal ability as measured
by the Stanford Verbal Test fell in the top 16% also were identified.
Results  showed  (1)  use  of  the  right  rule  decreased  over  the  first  three 
tests,  then  increased  dramatically;  (2)  use  of  A-type  rules  did  not  change 
much,  while  use  of  B-type  rules  decreased  slowly  over  time;  (3)  on  the 
second  test  students  with  high  verbal  ability  used  A-type  rules  much  more 
than  the  more  serious  B-type,  but  the  reverse  was  true  for  the  other 
students.  The  latter  finding  suggests  that  students  with  high  verbal 
ability  may  be  less  likely  to  adopt  the  more  serious  error  types  in  the 
early  stages  of  learning. 

Tatsuoka and Birenbaum (1981) reported the observed effects on test
performance resulting from differences in instructional backgrounds. An
adaptive diagnostic test was used as an integral part of an instructional
program in signed-number arithmetic on the PLATO® system. The testing
procedure worked well for most examinees, but not for those who had been
exposed to a different conceptual framework prior to the PLATO® instruction.
Differences in prior and subsequent instructional methods affected the
learning of more advanced materials and produced lower achievement scores
on the posttest given at the end of the program. These results present
a serious problem when students are to be routed to an instructional level
based only on performance on a diagnostic test. It is important to
examine the conceptual basis for both stages of instruction and to route
each examinee accordingly.


The two methods for diagnosing errors referred to above are both
deterministic; therefore, their rate of diagnosis diminishes if students
apply  their  rules  inconsistently.  Analysis  of  data  gathered  during  1979- 
1980  (Birenbaum  &  Tatsuoka,  1981;  Tatsuoka,  1983a)  confirmed  that  (1) 
students  tend  to  change  their  rules  of  operation  most  during  the  early 
stages  of  learning,  and  (2)  the  rate  of  diagnosis  decreases  accordingly, 
to  as  low  as  50%.  Since  this  is  precisely  the  learning  stage  when 
diagnosis  is  most  needed,  it  is  impossible  to  help  students  with 
prescriptive  information  from  these  error-diagnostic  systems. 

Furthermore,  the  creation  of  computer  programs  for  error  vector  systems 
and  process  networks  is  too  complex  and  time-consuming  for  general 
applications. 


NORM CONFORMITY AND INDIVIDUAL CONSISTENCY INDICES

Since the usefulness of the error diagnostic programs is seriously
limited when students change their rules of operation, a method for
detecting such changes seemed useful. Accordingly, two indices were
developed  for  measuring  the  degree  of  conformity  or  consistency  of  an 
individual  examinee's  response  pattern  on  a  set  of  items  (Tatsuoka  & 
Tatsuoka,  1980;  1981a;  1982a;  1983).  The  first,  called  the  norm  conformity 
index  (NCI),  measures  the  proximity  of  the  pattern  to  a  baseline  pattern 
in which all 0's precede all 1's when the items are arranged in some
prescribed  order.  The  second,  called  the  individual  consistency  index 
(ICI),  measures  the  extent  to  which  an  individual's  response  pattern 
remains  invariant  when  he  or  she  responds  to  several  waves  of  parallel 
items. Both of these indices were developed originally as potential tools
for assisting in the extraction of subsamples of examinees for whom the
data are unidimensional or nearly so.

The  NCI  is  a  sort  of  backward  extension  to  the  individual  level  of 
one  of  Cliff's  group  consistency  indices.  Its  calculation  requires  that 
the  test  items  be  rearranged  in  the  order  of  difficulty  for  some  particular 
group.  The  NCI  turned  out  not  to  be  very  useful  for  the  originally 
intended  purpose  of  extracting  unidimensional  subgroups.  Rather,  it  was 
found  to  be  more  useful  in  highlighting  the  different  response  patterns 
that  are  typical  of  individuals  with  different  instructional  backgrounds, 
and  in  categorizing  erroneous  rules  by  degree  of  seriousness. 
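
The idea of measuring proximity to the "all 0's before all 1's" baseline
can be sketched as below. The formula is not stated in this report; the
sketch assumes the pairwise form NCI = 2*Ua/U - 1, where U counts every
(0, 1) pair of item scores and Ua counts the pairs in which the 0 comes
first, and it assumes the items have already been arranged in the
prescribed order. The exact definition should be checked against the cited
papers.

```python
from typing import Sequence


def norm_conformity_index(pattern: Sequence[int]) -> float:
    """NCI sketch for a 0/1 response pattern.

    Assumes items are already in the prescribed order, so that a fully
    conforming pattern has all 0's before all 1's.
    """
    conforming_pairs = 0  # Ua: pairs (i < j) with a 0 at i and a 1 at j
    zeros_seen = 0
    for score in pattern:
        if score == 0:
            zeros_seen += 1
        else:
            # this 1 forms a conforming pair with every earlier 0
            conforming_pairs += zeros_seen
    ones = sum(pattern)
    all_pairs = ones * (len(pattern) - ones)  # U: every (0, 1) pair
    if all_pairs == 0:
        raise ValueError("undefined for all-0 or all-1 patterns")
    return 2.0 * conforming_pairs / all_pairs - 1.0


print(norm_conformity_index([0, 0, 1, 1]))  # baseline pattern
print(norm_conformity_index([1, 1, 0, 0]))  # fully reversed pattern
print(norm_conformity_index([0, 1, 0, 1]))  # partly conforming pattern
```

Under this assumed form the index runs from -1 for a completely reversed
pattern to +1 for the baseline pattern.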

The  ICI  depends  on  the  task  difficulties  as  determined  by  an  individual 
student's  state  of  knowledge.  Its  definition  calls  for  the  existence  of 
two  or  more  parallel  subtests.  Calculation  of  the  ICI  is  the  same  as 
that  for  the  NCI,  except  that  the  items  are  arranged  in  the  order  of 
difficulty  of  the  skill  types  for  the  particular  individual  instead  of 
the  order  of  difficulty  for  a  group.  The  unique  feature  of  the  ICI  is 
that  its  values  are  individually  oriented  and  free  from  group  dependence. 

The  ICI  was  found  to  be  quite  useful  for  identifying  individuals 
who  could  be  removed  from  a  sample  to  improve  the  approximation  to 
unidimensionality  exhibited  by  the  data  matrix  of  the  remaining  group. 

The  ICI  value  is  large  when  an  individual  responds  to  similar  items 
in  the  same  way.  A  small  ICI  value  indicates  uncertain  or  random 
responses.  A  combination  of  high  ICI  and  low  total  score  indicates 
consistent  errors,  while  a  combination  of  low  ICI  and  low  total  score 
suggests  that  the  student  does  not  have  a  clear  method  for  proceeding 
and  is  answering  at  random  or  by  trial  and  error. 
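
The reading of ICI together with total score described above amounts to a
small decision rule, sketched here. The cutoff values and the function name
are hypothetical; the report gives no numeric thresholds, and cases with a
total score that is not low are simply left unflagged in this sketch.

```python
def interpret(ici: float, total_score: int, n_items: int,
              ici_cutoff: float = 0.8, score_cutoff: float = 0.5) -> str:
    """Combine ICI and total score into a rough interpretation.

    ici_cutoff and score_cutoff are illustrative assumptions only.
    """
    low_score = total_score / n_items < score_cutoff
    high_ici = ici >= ici_cutoff
    if low_score and high_ici:
        # similar items answered the same way, yet mostly wrong
        return "consistent erroneous rule suspected"
    if low_score:
        # no stable way of responding to similar items
        return "random or trial-and-error responding suspected"
    return "no flag"


print(interpret(0.95, 3, 20))
print(interpret(0.20, 4, 20))
```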


Application  of  the  ICI,  together  with  the  total  scores,  to  several 
of  the  signed  number  arithmetic  datasets  detected  most  (over  95%)  of  the 
erroneous  rules  which  had  been  detected  separately  by  the  error  vector 
diagnostic  system. 

Although  the  ICI  is  useful  in  detecting  aberrant  response  patterns 
resulting  from  the  use  of  wrong  algorithms,  it  requires  repeated 
measures in a test. Therefore, the index is not applicable to many
commercial achievement tests or to criterion-referenced tests designed
to  measure  the  outcome  of  treatments  in  a  wide  range  of  content  areas. 
However,  when  tests  are  aimed  at  assessing  the  progress  of  a  student's 
learning  and  used  as  an  integral  part  of  instruction,  the  information 
obtained  from  the  ICI  will  be  useful  for  assessing  how  well  the  student 
understands  the  subject. 

EXTENDED  CAUTION  INDICES  AND  RULE  SPACE 

The  necessity  for  dealing  quantitatively  with  variations  in  errors 
and changing rules of operation led to the investigation of probabilistic
models for error diagnosis based on item response theory. Indices of the
degree to which an individual's pattern of responses is unusual were
classified into two general types: (1) those that use item response theory
and (2) those that rely on observed item responses and standard summary
statistics based on those responses. Tatsuoka and Linn (1981, 1983)
demonstrated a link between these two approaches by showing a correspondence
between the S-P curve theory developed by Sato, and test response curves
and group response curves developed from item response theory. Furthermore,
the caution index defined in Sato's S-P curve theory, which is based on
a comparison of observed item responses to group responses, was extended
to theory-based estimates of person and group response probabilities.
That is, S-P curve theory and the caution index, which originally were
developed within a discrete domain of 0-1 scoring, were extended to a
more general case of probabilities.

Five extended caution indices were defined, designated respectively the
ECI1, ECI2, ECI3, ECI4, and ECI5. These indices are linear transformations
of the covariance of a person's response pattern with one of two theoretical
curves computed using item response theory (i.e., the group response curve
for the ECI1, ECI2 and ECI3, or the person response curve for the ECI4
and ECI5).

The  ECI4  is  similar  to  the  individual  consistency  index,  or  ICI  (see 
above).  The  ICI  was  shown  to  be  useful  in  detecting  a  variety  of  erroneous 
rules  of  operation  with  signed-number  addition  and  subtraction  problems, 
but  its  application  is  limited  because  it  requires  repeated  measures  within 
a  test.  The  ECI4  not  only  avoids  the  repeated  measures  limitation  but  it 
also  is  effective  for  identifying  persons  who  consistently  use  an  erroneous 
rule  in  answering  signed-number  arithmetic  problems.  Based  on  its 
application  to  a  set  of  achievement  test  data,  the  ECI4  distinguishes 
persons  who  are  consistently  using  erroneous  rules  from  those  who  are 
not,  provided  that  these  erroneous  rules  are  not  popular  in  the  data  used 
for  estimating  item  and  person  parameters.  Therefore,  selection  of  the 
correct data set for estimating these parameters is very important when
applying  these  indices. 

Tatsuoka and Tatsuoka (1982a) investigated the statistical properties
of the ECI1, ECI2 and ECI4. They found that both the ECI1 and ECI2 have
the constant expectation of zero, regardless of the level of the person
parameter θ_i, while the expectation of the ECI4 is a function of θ_i.


As  was  shown  with  data  from  a  40-item  signed  number  subtraction  test, 
the  conditional  variances  of  the  three  ECIs  under  consideration  have 
U-shapes,  with  inflated  values  at  both  the  extremely  high  and  extremely 
low  true  scores  and  fairly  constant  values  in  between.  In  order  to  avoid 
this weakness, the ECI1, ECI2, and ECI4 were standardized by subtracting the
conditional expectation of each ECI from the original ECI and dividing by
the square root of its conditional variance (Harnisch & Tatsuoka, 1983).
Goodness-of-fit tests of the standardized ECI's showed that they fit
normal  distributions  well. 
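
The standardization step can be sketched as follows. This is an
illustration under stated assumptions, not the report's exact ECI
definitions: a two-parameter logistic model supplies the response
probabilities, items are treated as locally independent, and the raw index
is a hypothetical linear function of the responses with weights equal to
each item's probability minus the mean probability. For any such linear
index, the conditional expectation and variance given the person parameter
follow from the Bernoulli item scores.

```python
import math
from typing import List, Sequence, Tuple


def p_correct(theta: float, a: float, b: float) -> float:
    """Two-parameter logistic item response function (assumed model)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def standardized_index(x: Sequence[int], theta: float,
                       items: Sequence[Tuple[float, float]]) -> float:
    """Standardize a hypothetical linear caution-type index.

    x     : 0/1 item scores
    theta : person parameter (assumed known or already estimated)
    items : (discrimination a, difficulty b) per item
    """
    probs: List[float] = [p_correct(theta, a, b) for a, b in items]
    mean_p = sum(probs) / len(probs)
    weights = [p - mean_p for p in probs]

    # Raw index; the sign is chosen so that larger values mean a more
    # unusual pattern (an assumption of this sketch).
    raw = -sum(w * xj for w, xj in zip(weights, x))

    # Conditional moments given theta, using E[x_j] = P_j and
    # Var[x_j] = P_j * (1 - P_j) under local independence.
    expected = -sum(w * p for w, p in zip(weights, probs))
    variance = sum(w * w * p * (1.0 - p) for w, p in zip(weights, probs))
    if variance == 0.0:
        raise ValueError("conditional variance is zero")

    # Subtract the conditional expectation, divide by the conditional SD.
    return (raw - expected) / math.sqrt(variance)


# Hypothetical four-item test, easiest item first.
ITEM_PARAMS = [(1.0, -1.5), (1.0, -0.5), (1.0, 0.5), (1.0, 1.5)]
z_typical = standardized_index([1, 1, 0, 0], 0.0, ITEM_PARAMS)
z_aberrant = standardized_index([0, 0, 1, 1], 0.0, ITEM_PARAMS)
print(z_typical, z_aberrant)
```

In this sketch a pattern that misses the easy items and passes the hard
ones receives a positive standardized value, while the expected pattern
receives a negative one.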

Since  all  of  the  extended  caution  indices  are  based  on  conditional 
probability of θ_i, they do not allow a fair comparison of two values if
they are obtained from examinees at two different ability levels. However,
since the standardized ECI's do not depend on θ_i, two standardized ECI
values obtained from different θ_i values are comparable in terms of the
extent  of  anomaly  they  signify. 

The detection rate of the various ECI's for erroneous rules proved
to be unexpectedly low in all cases (about 60%). Although the reasons
for this are not entirely clear, it appears that if an otherwise normal
dataset includes a considerable number of aberrant response patterns,
then such patterns are no longer detectable with high probability by the
traditional use of these indices. Investigation of an alternative approach
therefore  was  necessary. 

A probabilistic model was developed, called "rule space," in which
all responses, both correct and erroneous, are decomposed into component
parts and mapped as points in a geometric space (Tatsuoka, 1983a, 1983b;
Tatsuoka & Baillie, 1982b). Rule space is defined as the Cartesian product
of the estimated true scores and the values of the standardized extended
caution index ECI4z (Tatsuoka & Baillie, 1982b). In other words, rule
space  is  a  geometric  representation  of  the  rules  used  by  the  student.  In 
this  space,  the  erroneous  rules  resulting  from  the  same  kind  of 
misconception  cluster  closely,  as  was  confirmed  by  results  plotted  from 
several  datasets. 

The  advantage  of  using  the  standardized  extended  caution  index  ECI4z 
is  its  effectiveness  for  separating  clusters  of  responses  from  one  another. 
If two response patterns from the same θ level differ, they will be plotted
at  different  locations  in  the  rule  space.  Furthermore,  the  degree  of 
unusualness  of  a  response  is  represented  by  its  distance  from  the  true- 
score  axis.  A  cluster  of  response  patterns  consists  of  the  response 
pattern  yielded  by  some  rule  and  its  "slips,"  due  to  partially  consistent 
application  of  the  rule.  Using  pattern  classification  to  separate  the 
clusters  in  the  rule  space  accounts  for  variability  of  errors  in  the 
model  (Tatsuoka,  1982b).  By  calculating  a  set  of  linear  classification 
functions  of  the  various  clusters  and  by  setting  boundaries  to  divide  the 
regions,  it  is  possible  to  identify  the  underlying  misconception  of  a  new 
response  with  some  probability  of  error  by  examining  in  which  region  the 
new  response  falls.  Thus,  the  problem  of  diagnosing  an  individual 
student's  misconceptions  has  been  transmuted  into  a  classification 
problem.  Using  the  probabilistic  approach  of  rule  space  and  pattern 
classification  for  the  diagnosing  of  errors  promises  to  remedy  the 
weaknesses  of  deterministic  methods  without  losing  their  strengths. 
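
The classification step can be sketched as below. This is a minimal
illustration, not the project's procedure: the cluster names and
coordinates are invented, and the linear classification functions assume
that every cluster shares the same spherical spread, so that assigning a
point to the function with the largest value is the same as assigning it
to the nearest cluster centroid.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (estimated true score, ECI4z)


def centroid(points: List[Point]) -> Point:
    """Mean location of a cluster in the two-dimensional rule space."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def fit(clusters: Dict[str, List[Point]]) -> Dict[str, Tuple[float, float, float]]:
    """Build one linear function g(x) = w1*x1 + w2*x2 + bias per cluster.

    With a shared spherical spread (an assumption of this sketch),
    w is the centroid and bias = -0.5 * |centroid|^2.
    """
    functions = {}
    for name, points in clusters.items():
        m1, m2 = centroid(points)
        functions[name] = (m1, m2, -0.5 * (m1 * m1 + m2 * m2))
    return functions


def classify(functions: Dict[str, Tuple[float, float, float]],
             point: Point) -> str:
    """Assign a new response to the region whose function is largest."""
    def score(name: str) -> float:
        w1, w2, bias = functions[name]
        return w1 * point[0] + w2 * point[1] + bias
    return max(functions, key=score)


# Hypothetical clusters: a rule's own pattern plus its "slips".
CLUSTERS: Dict[str, List[Point]] = {
    "rule_A": [(0.30, 2.1), (0.35, 1.9), (0.25, 2.3)],
    "rule_B": [(0.60, -1.4), (0.55, -1.6), (0.65, -1.2)],
}
FUNCTIONS = fit(CLUSTERS)
print(classify(FUNCTIONS, (0.32, 1.8)))
print(classify(FUNCTIONS, (0.58, -1.2)))
```

The boundary between two regions in this sketch is the line on which the
two functions are equal; a response that falls near a boundary is the case
in which the probability of misclassification is highest.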